spectral initialization
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Learning single index model with gradient descent: spectral initialization and precise asymptotics
Non-convex optimization plays a central role in many statistics and machine learning problems. Despite the landscape irregularities of general non-convex functions, recent work has shown that for many learning problems with random data and a large enough sample size, there exists a region around the true signal with a benign landscape. Motivated by this observation, a widely used strategy is a two-stage algorithm: first apply a spectral initialization to land in this region, then run gradient descent for further refinement. While this two-stage algorithm has been extensively analyzed for many non-convex problems, the precise distributional properties of both its transient and long-time behavior remain to be understood. In this work, we study this two-stage algorithm in the context of single index models under the proportional asymptotics regime. We derive a set of dynamical mean field equations that describe the precise behavior of the trajectory of spectrally initialized gradient descent in the large system limit. We further show that when the spectral initialization successfully lands in a region of benign landscape, the equation system is asymptotically time-translation invariant and exponentially convergent, and thus admits a set of long-time fixed points that give a mean field characterization of the limiting point of the gradient descent dynamics. As a proof of concept, we demonstrate our general theory on the example of regularized Wirtinger flow for phase retrieval.
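As a rough illustration of the two-stage pipeline described above, the sketch below runs spectral initialization followed by plain (unregularized) Wirtinger-flow gradient descent on a toy real-valued phase retrieval instance. The dimensions, step size, and iteration count are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 50                            # samples, dimension (toy sizes)
x_star = rng.standard_normal(d)
x_star /= np.linalg.norm(x_star)           # unit-norm true signal
A = rng.standard_normal((n, d))
y = (A @ x_star) ** 2                      # phaseless (squared) measurements

# Stage 1: spectral initialization -- leading eigenvector of the
# measurement-weighted covariance (1/n) sum_i y_i a_i a_i^T.
M = (A * y[:, None]).T @ A / n
x = np.linalg.eigh(M)[1][:, -1]

# Stage 2: gradient descent on the Wirtinger-flow least-squares loss
#   f(x) = (1/4n) sum_i ((a_i^T x)^2 - y_i)^2.
eta = 0.1
for _ in range(500):
    Ax = A @ x
    x -= eta * (A.T @ ((Ax ** 2 - y) * Ax)) / n

# The signal is recoverable only up to a global sign.
err = min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
```

With this oversampling ratio (n/d = 40), the spectral initializer lands well inside the benign region and gradient descent converges to the signal up to sign.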
Model-free algorithms for fast node clustering in SBM type graphs and application to social role inference in animals
Cloez, Bertrand, Cotil, Adrien, Menassol, Jean-Baptiste, Verzelen, Nicolas
Graphs have become extremely useful for representing a wide variety of systems in different contexts: biological, social, informational, and more. A basic way to study them is to partition the vertices of a graph into clusters that are more densely connected internally; this is commonly called community detection or graph clustering; see for instance [20, 1]. Community detection and clustering are central problems in machine learning and data science. In particular, the stochastic block model (SBM) [34, 25] has been widely used as a canonical model for community detection and as a building block for clustering under additional structural assumptions. In its most general form, the SBM is a randomly weighted graph model in which each node has an unobserved label and the probability of observing a given edge between two nodes depends only on the labels of the nodes under consideration.
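To make the model concrete, here is a minimal sketch that samples a two-block SBM and recovers the hidden labels with a basic adjacency spectral heuristic. The community count, connection probabilities, and the spectral method itself are illustrative assumptions, not the model-free algorithms the paper develops.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 300, 2
labels = rng.integers(0, k, size=n)          # hidden node labels
P = np.array([[0.30, 0.05],
              [0.05, 0.30]])                 # within / between edge probabilities

# Sample a symmetric adjacency matrix: the probability of each edge
# depends only on the labels of its two endpoints.
probs = P[labels[:, None], labels[None, :]]
upper = np.triu(rng.random((n, n)) < probs, k=1)
A = (upper | upper.T).astype(float)

# Recover the two communities from the sign pattern of the second
# eigenvector of the adjacency matrix (a basic spectral heuristic).
eigvecs = np.linalg.eigh(A)[1]
guess = (eigvecs[:, -2] > 0).astype(int)

# Agreement with the truth, up to the global label permutation.
acc = max(np.mean(guess == labels), np.mean(guess != labels))
```

At these parameters the signal eigenvalue dominates the noise level, so the sign split of the second eigenvector recovers almost all labels.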
Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparameterized Matrix Factorization
Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization assumptions. One example is the incremental learning phenomenon in gradient flow (GF) on an overparameterized matrix factorization problem with small initialization: GF learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time. In this paper, we develop a quantitative understanding of this incremental learning behavior for GF on the symmetric matrix factorization problem, using its closed-form solution obtained by solving a Riccati-like matrix differential equation. We show that incremental learning emerges from a time-scale separation among the dynamics corresponding to learning different components of the target matrix. Decreasing the initialization scale makes these time-scale separations more prominent, allowing one to find low-rank approximations of the target matrix. Lastly, we discuss possible avenues for extending this analysis to asymmetric matrix factorization problems.
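The incremental-learning effect is easy to reproduce numerically. The sketch below runs plain gradient descent (as a discrete proxy for gradient flow) on the symmetric factorization loss ||UU^T - M||_F^2 with a small random initialization; the target spectrum (eigenvalues 5 and 1), initialization scale, and step size are illustrative assumptions, not the paper's closed-form analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 20
# Rank-2 symmetric PSD target with well-separated eigenvalues 5 and 1.
Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))
v1, v2 = Q[:, 0], Q[:, 1]
M = 5.0 * np.outer(v1, v1) + 1.0 * np.outer(v2, v2)

alpha, eta = 1e-3, 0.01                   # small init scale, step size
U = alpha * rng.standard_normal((d, d))
snapshot = None
for t in range(2000):
    U -= eta * ((U @ U.T - M) @ U)        # GD on f(U) = ||UU^T - M||_F^2 / 4
    if t == 250:                          # mid-training snapshot
        snapshot = np.linalg.eigvalsh(U @ U.T)

final = np.linalg.eigvalsh(U @ U.T)
# Mid-training, the top eigenvalue (5) is already learned while the
# second (1) is still near zero; both are learned by the end.
```

The larger component grows at roughly five times the exponential rate of the smaller one, so the two learning phases separate cleanly in time, and shrinking `alpha` widens the gap further.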
To Reviewer 1 (R1)
We thank all three reviewers for their constructive comments. We address them below one by one. Q1: What makes it nontrivial to extend the regularity condition and proof technique in [11] to Riemannian optimization? The Grassmannian manifold is nonconvex, making the analysis more complex. We will incorporate these points into a revised version of the manuscript.
Frequency-Constrained Learning for Long-Term Forecasting
Kong, Menglin, Zheng, Vincent Zhihao, Sun, Lijun
However, modern deep forecasting models often fail to capture these recurring patterns due to spectral bias and a lack of frequency-aware inductive priors. Motivated by this gap, we propose a simple yet effective method that enhances long-term forecasting by explicitly modeling periodicity through spectral initialization and frequency-constrained optimization. Specifically, we extract dominant low-frequency components via Fast Fourier Transform (FFT)-guided coordinate descent, initialize sinusoidal embeddings with these components, and employ a two-speed learning schedule to preserve meaningful frequency structure during training. Our approach is model-agnostic and integrates seamlessly into existing Transformer-based architectures. Extensive experiments across diverse real-world benchmarks demonstrate consistent performance gains--particularly at long horizons--highlighting the benefits of injecting spectral priors into deep temporal models for robust and interpretable long-range forecasting. Moreover, on synthetic data, our method accurately recovers ground-truth frequencies, further validating its interpretability and effectiveness in capturing latent periodic patterns.
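A minimal sketch of the FFT-guided spectral initialization step on synthetic data follows. The two ground-truth frequencies, series length, and noise level are assumed for illustration; the paper's coordinate-descent selection and two-speed training schedule are not reproduced here, only the frequency extraction and sinusoidal embedding initialization.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 512
t = np.arange(T)
# Toy series with two latent periodicities (bins 5 and 23) plus noise.
y = (np.sin(2 * np.pi * 5 * t / T)
     + 0.5 * np.sin(2 * np.pi * 23 * t / T)
     + 0.1 * rng.standard_normal(T))

# FFT-guided selection of the dominant frequency components.
spec = np.abs(np.fft.rfft(y))
spec[0] = 0.0                            # drop the DC term
k = 2
top_bins = np.sort(np.argsort(spec)[-k:])  # k strongest frequency bins
freqs = top_bins / T                       # cycles per step

# Spectral initialization of sinusoidal embeddings:
# one (sin, cos) pair per selected frequency.
emb = np.stack([np.sin(2 * np.pi * f * t) for f in freqs]
               + [np.cos(2 * np.pi * f * t) for f in freqs], axis=1)
```

In a full model, these embeddings would feed the forecaster, with the frequency parameters updated on a slower schedule than the rest of the network so the recovered spectral structure is preserved during training.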
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
A Smoothing Newton Method for Rank-one Matrix Recovery
We consider the phase retrieval problem, which involves recovering a rank-one positive semidefinite matrix from rank-one measurements. A recently proposed algorithm based on Bures-Wasserstein gradient descent (BWGD) exhibits superlinear convergence, but it is unstable, and existing theory can only prove local linear convergence for higher-rank matrix recovery. We resolve this gap by revealing that BWGD implements Newton's method on a nonsmooth and nonconvex objective. We develop a smoothing framework that regularizes the objective, enabling a stable method with rigorous superlinear convergence guarantees. Experiments on synthetic data demonstrate this superior stability while maintaining fast convergence.

Phase retrieval, the problem of recovering a real or complex signal from magnitude-only measurements, is a fundamental problem in signal processing. Its applications range from X-ray crystallography to astronomical imaging, where measurement systems capture a form of intensity [Harrison, 1993, Fienup, 1982, Fienup and Dainty, 1987, Miao et al., 1998]. The seemingly simple constraint of measuring magnitudes transforms what would be a linear problem into a challenging nonlinear and nonconvex optimization problem. More critically, the direct optimization formulation using least squares yields a nonconvex objective function, making it difficult to solve effectively. The phase retrieval problem is a specific instance of a broader class of low-rank matrix sensing problems that arise throughout signal processing [Recht et al., 2010].
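The "rank-one measurements of a rank-one matrix" viewpoint can be checked directly: squared-magnitude measurements of a signal x are linear measurements of the lifted matrix X = xx^T against the sensing matrices a_i a_i^T. A small numerical check (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 8, 40
x = rng.standard_normal(d)
A = rng.standard_normal((n, d))

# Magnitude-only measurements y_i = (a_i^T x)^2 ...
y = (A @ x) ** 2

# ... are linear in the lifted rank-one PSD matrix X = x x^T:
#   y_i = a_i^T X a_i = <a_i a_i^T, X>.
X = np.outer(x, x)
y_lifted = np.einsum('ni,ij,nj->n', A, X, A)

match = np.allclose(y, y_lifted)
```

This lifting is what turns the nonconvex signal-recovery problem into a (rank-constrained) linear matrix sensing problem.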